Cache write generate for parallel image processing on shared memory architectures
Abstract
We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Similar resources
Cache Write Generate For High-Performance Processing
Much attention has been paid to read caching and several schemes have been developed to make read caching very efficient. As a result, the performance of write caching has become a concern. This paper investigates write caching policies and how they affect the performance of memory systems. We show that write caching can greatly alter the hit/miss ratios, but only more subtly affects the performanc...
Design of a Simulator for Large-Scale Distributed Shared-Memory Cache-Coherent Architectures
As the scale and the complexity of parallel computer systems grow rapidly, the study of interactions between application algorithms and parallel architectures becomes more important. Execution-driven simulation under realistic workloads proves to be an accurate and efficient technique for studying the performance of computer systems. However, direct-execution simulation of shared-memory cache-coh...
Parallel Conventional Systems versus Parallel Logic Programming Systems on Distributed Shared Memory Architectures
Distributed shared memory architectures have been the object of research by many computer science groups. Research ranges broadly from hardware-based coherence protocols to DSM software protocols on networks of workstations, passing through high-technology interconnection networks that reduce network latency. In this work we thoroughly investigate how different hardware cache coherence protocols affect ...
Memory Latency in Distributed Shared-Memory Multiprocessors
Analytical models were developed and simulations of memory latency were performed for Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA), Local-Remote-Global (LRG), and Replicated Concurrent-Read (RCR) architectures for hit rates from 0.1 to 0.9 in steps of 0.1, memory access times of 10 ns to 100 ns, proportions of read/write access from 0.01 to 0.1, and block sizes of 8 to ...
Parallel Processing Using the Silicon Graphics / Cray Origin 2000
The Origin 2000 is a high performance computing platform produced jointly by Silicon Graphics / Cray. This scalable shared memory processor (SSMP) may be configured with up to 128 processors in a single system image. The Origin is a scalable, cache coherent, non-uniform memory access (CC-NUMA), distributed shared memory (DSM) architecture based on a hypercube interconnection topology. Effective...
Journal title:
- IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society
Volume 5, Issue 7
Pages -
Publication date 1996